CONCENTRATION OF NORMALIZED SUMS AND A CENTRAL
LIMIT THEOREM FOR NONCORRELATED RANDOM
VARIABLES

By Sergey G. Bobkov 
University of Minnesota 

For noncorrelated random variables, we study a concentration 
property of the family of distributions of normalized sums formed by 
sequences of times of a given large length. 

1. Introduction. Let $X = (X_1, \dots, X_n)$ be a vector of $n$ random
variables on a probability space $(\Omega, \mathbf{P})$ such that, for all
$i, j = 1, \dots, n$,

(1.1)   $\mathbf{E} X_i X_j = \delta_{ij},$

where $\delta_{ij}$ is Kronecker's symbol. Given a positive integer $k \le n$,
denote by $\mathcal{G}_{n,k}$ the family of all collections of indices
$\tau = \{i_1, \dots, i_k\}$ of size $k$ with $1 \le i_1 < \cdots < i_k \le n$.
To every $\tau \in \mathcal{G}_{n,k}$ we associate a normalized sum

$S_\tau = \frac{X_{i_1} + \cdots + X_{i_k}}{\sqrt{k}}$

and a corresponding distribution function
$F_\tau(x) = \mathbf{P}\{S_\tau \le x\}$, $x \in \mathbf{R}$. In this
paper we show that, when $k$ is a large fixed number, most of the random
variables $S_\tau$ are "almost" equidistributed, that is, most of the
$F_\tau$'s are close to the average distribution function

(1.2)   $F(x) = \frac{1}{C_n^k} \sum_{\tau} F_\tau(x),$

where $C_n^k = \mathrm{card}(\mathcal{G}_{n,k}) = \frac{n!}{k!\,(n-k)!}$
stands for the usual combinatorial coefficients. To study the rate of
closeness, we use the Lévy distance $L(F_\tau, F)$, which is defined to be
the infimum over all $\delta \ge 0$ such that
$F(x - \delta) - \delta \le F_\tau(x) \le F(x + \delta) + \delta$ for all
$x \in \mathbf{R}$. In terms of the normalized counting measure
$\mu = \mu_{n,k}$ on $\mathcal{G}_{n,k}$, we have:

Theorem 1.1. Under (1.1), for all $\delta > 0$,

(1.3)   $\mu\{\tau : L(F_\tau, F) \ge \delta\} \le C k^{3/4} \exp(-c k \delta^8),$

where $C$ and $c$ are certain positive numerical constants.


The property that, for a growing number of summands $k$, many $F_\tau$'s
approximate a "center" $F$ may be viewed as a weak kind of central limit
theorem. In general, however, the center $F$ essentially depends on $k$ and
the distribution of the underlying sequence $X$.

An analogous concentration property has been intensively studied in a
number of related randomized models. In a seminal work (year?), as an
application of the isoperimetric theorem on the sphere, Sudakov established
a concentration property of distributions of the weighted sums
$\sum_{j=1}^n \theta_j X_j$, provided that the weights $\theta_j$ are
randomly chosen as coordinates of a point on the unit Euclidean sphere in
$\mathbf{R}^n$ (with respect to the uniform measure on the sphere). A
different approach in the case of normalized Gaussian weights was suggested
by von Weizsäcker (year?). Quantitative versions with refinements for the
rate of concentration in the case of log-concave random vectors $X$ were
obtained in (year?), (year?); see also (year?), (year?), (year?).
Multidimensional random projections of $X$ were considered by Naor and
Romik (year?), who essentially used a concentration inequality on the
Grassmannian manifold.

As it turns out, the weights can be restricted to the form
$\theta_j = \pm 1/\sqrt{n}$ (cf. (year?)). As well as on the sphere, the
latter model uses a specific dimension-free concentration property on the
discrete cube. Similarly, under the conditions of Theorem 1.1 we are
dealing with certain weights, namely of the form

$\theta_j = \frac{\varepsilon_j}{\sqrt{k}}, \quad 1 \le j \le n,$

where the sequence $(\varepsilon_j)$ contains exactly $k$ 1's and $n - k$
0's. With respect to the previous examples, this model seems to be closest
to the classical, nonrandomized version of the central limit theorem, since
only usual sums of data $X_j$ are taken into consideration. The
concentration property (1.3) thus tells us that the resulting sum does not
depend, in essence, on the concrete times when the observations are made.

Moreover, under certain natural assumptions on the random variables $X_j$,
the average distribution $F$ must be close to the standard normal
distribution function $\Phi$. Namely, suppose we have an infinite sequence
of random variables $X_j$ that satisfy the orthogonality condition (1.1).


Theorem 1.2. Let $\mathbf{E} X_j = 0$ and $\sup_j \mathbf{E}|X_j|^3 < \infty$.
Suppose that in probability, as $n \to \infty$,

$\frac{X_1^2 + \cdots + X_n^2}{n} \to 1.$

Then for all $(k, n)$ such that $1 \ll k \ll n$, for every $\delta > 0$ and
for all $\tau \in \mathcal{G}_{n,k}$ except for a set of $\mu$-measure at
most $C k^{3/4} \exp(-c k \delta^8)$, we have
$L(F_\tau, \Phi) \le \delta + o(1)$.

Here $o(1)$ denotes a certain sequence $\varepsilon_{n,k}$, independent of
$\delta$, which converges to zero for the indicated range of $(k, n)$.

The proofs of Theorems 1.1 and 1.2 are given in Sections 3 and 5. The
proof of Theorem 1.1 relies on a concentration property of the measure
$\mu$ with respect to the canonical graph structure on $\mathcal{G}_{n,k}$.
We discuss this property separately in Section 2. Section 4 is devoted to
one auxiliary inequality on elementary symmetric polynomials that is needed
for Theorem 1.2. It is also applied in Section 6 to study the asymptotic
normality of normalized sums for finite exchangeable sequences.

2. Concentration on slices of the discrete cube. In this section it is
convenient to identify $\mathcal{G}_{n,k}$ with a subset of the discrete
cube, so let us redefine it as

$\mathcal{G}_{n,k} = \{x = (x_1, \dots, x_n) \in \{0,1\}^n : x_1 + \cdots + x_n = k\}.$

From the discrete cube, $\mathcal{G}_{n,k}$ inherits the structure of a
graph: neighbors are pairs of points which differ in exactly two
coordinates. We equip $\mathcal{G}_{n,k}$ with the metric

$\rho(x, y) = \frac{1}{2}\,\mathrm{card}\{i \le n : x_i \ne y_i\}, \quad x, y \in \mathcal{G}_{n,k},$

which is one half of the Hamming distance. Every point
$x \in \mathcal{G}_{n,k}$ has $k(n - k)$ neighbors
$\{s_{ij}x\}_{i \in I(x),\, j \in J(x)}$ parametrized by

$I(x) = \{i \le n : x_i = 1\}, \quad J(x) = \{j \le n : x_j = 0\}.$

Namely, $(s_{ij}x)_r = x_r$ for $r \ne i, j$ and $(s_{ij}x)_i = x_j$,
$(s_{ij}x)_j = x_i$.

For every function $f$ on $\mathcal{G}_{n,k}$ and a point $x$ in
$\mathcal{G}_{n,k}$, the discrete gradient $\nabla f(x)$ represents a
vector in the Euclidean space $\mathbf{R}^{I(x) \times J(x)}$ of dimension
$k(n - k)$ with coordinates
$(f(x) - f(s_{ij}x))_{i \in I(x),\, j \in J(x)}$. It has Euclidean length
$|\nabla f(x)|$ given by

$|\nabla f(x)|^2 = \sum_{\rho(x,y)=1} |f(x) - f(y)|^2 = \sum_{i \in I(x)} \sum_{j \in J(x)} |f(x) - f(s_{ij}x)|^2.$

In 1987, Diaconis and Shahshahani (year?), using a group representation
approach, derived a remarkable inequality of Poincaré type on this graph:

(2.1)   $\int f^2\,d\mu - \Bigl(\int f\,d\mu\Bigr)^2 \le \frac{1}{2n} \int |\nabla f|^2\,d\mu.$

Note that the constant on the right-hand side can be chosen independently
of $k$. Actually, for the quadratic form
$(Qf, f) = \int |\nabla f|^2\,d\mu$ in $L^2(\mathcal{G}_{n,k}, \mu)$, all
eigenfunctions and eigenvalues are known. As emphasized in (year?), first
they were essentially determined without using group theory by Karlin and
McGregor (year?). In particular, with our notations (2.1) becomes equality
for all linear functions $f(x) = a_1 x_1 + \cdots + a_n x_n$.
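The equality case for linear functions can be checked by brute force on a small slice. The following is a minimal sketch, not part of the original argument: the helper names `slice_points`, `neighbors` and `poincare_sides` are hypothetical, and the code assumes the gradient convention $|\nabla f(x)|^2 = \sum_{\rho(x,y)=1}|f(x)-f(y)|^2$ stated above. Under that convention, exact rational arithmetic gives $\mathrm{Var}_\mu(f) = \frac{1}{2n}\int|\nabla f|^2\,d\mu$ for linear $f$.

```python
from fractions import Fraction
from itertools import combinations


def slice_points(n, k):
    """All 0/1 vectors of length n with exactly k ones (the slice G_{n,k})."""
    return [tuple(1 if i in ones else 0 for i in range(n))
            for ones in map(set, combinations(range(n), k))]


def neighbors(x):
    """All points s_ij x: move a one at position i to a zero position j."""
    ones = [i for i, v in enumerate(x) if v == 1]
    zeros = [j for j, v in enumerate(x) if v == 0]
    result = []
    for i in ones:
        for j in zeros:
            y = list(x)
            y[i], y[j] = 0, 1
            result.append(tuple(y))
    return result


def poincare_sides(a, k):
    """Exact (variance, mean of |grad f|^2) for f(x) = sum_i a_i x_i."""
    n = len(a)
    pts = slice_points(n, k)
    m = len(pts)

    def f(x):
        return sum(Fraction(ai) * xi for ai, xi in zip(a, x))

    mean = sum(f(x) for x in pts) / m
    variance = sum((f(x) - mean) ** 2 for x in pts) / m
    energy = sum(sum((f(x) - f(y)) ** 2 for y in neighbors(x))
                 for x in pts) / m
    return variance, energy


if __name__ == "__main__":
    var, energy = poincare_sides([1, 2, 3, 5, 8], k=2)
    # For linear f the ratio variance / energy is 1 / (2n) with n = 5.
    print(var, energy, var / energy)
```

Exact fractions are used so that the identity is tested without rounding error; the enumeration is exponential in $n$ and is only meant for small cases.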

If $|\nabla f|$ is bounded by a constant, say, $\sigma$ (such functions may
be viewed as Lipschitz with Lipschitz seminorm at most $\sigma$), then by
(2.1), $\mathrm{Var}_\mu(f) \le \frac{\sigma^2}{2n}$. This already shows
that Lipschitz functions are strongly concentrated around their $\mu$-means
$\mathbf{E}_\mu f = \int f\,d\mu$. Applying (2.1) to functions of the form
$e^{tf}$ and properly iterating over small $t$, we arrive at a much better
estimate,

(2.2)   $\mu\{|f - \mathbf{E}_\mu f| \ge h\} \le C e^{-c\sqrt{n}\,h/\sigma}, \quad h \ge 0,$

up to some numerical positive constants $C$ and $c$. The property that
Poincaré-type inequalities imply exponential bounds on the tails of
Lipschitz functions was first observed by Gromov and Milman (year?) (in the
context of Riemannian manifolds) and by Borovkov and Utev (year?) (for
probability measures on the real line). Afterward it was intensively
studied in the literature; see (year?) for an extension to the graph
setting or (year?) for an account of the question.

Although it is not possible to sharpen (2.2) on the basis of (2.1), we
may wonder, in analogy with the usual discrete cube, whether a stronger
Gaussian bound such as

(2.3)   $\mu\{|f - \mathbf{E}_\mu f| \ge h\} \le C \exp(-c n h^2/\sigma^2), \quad h \ge 0,$

holds in the case of the graph $\mathcal{G}_{n,k}$. As is well known, in
general, such an improvement can be reached by virtue of a logarithmic
Sobolev inequality. An important step in this direction was made by Lee and
Yau (year?). They proved that, for every real-valued function $f$ on
$\mathcal{G}_{n,k}$,

(2.4)   $\mathrm{Ent}_\mu(f^2) \le \frac{C \log(n/k)}{n} \int |\nabla f|^2\,d\mu,$

where $C$ is a numerical constant and where we assume for simplicity of
notations that $k \le n/2$. (A slightly weaker inequality with factor
$\log n$ in place of $\log(n/k)$ was obtained earlier in (year?).) Here and
elsewhere, the entropy functional is defined by

$\mathrm{Ent}(g) = \mathbf{E}\, g \log g - \mathbf{E} g \log \mathbf{E} g, \quad g \ge 0.$

Thus, when $k$ is proportional to $n$, say, of order $n/2$, the additional
logarithmic term $\log(n/k)$ vanishes and then the logarithmic Sobolev
inequality (2.4) represents an improvement, up to a factor in the constant,
of the spectral gap inequality (2.1) and implies, in particular, the
Gaussian deviation inequality (2.3).

As for the range $k = o(n)$, we have to keep in mind that the constant
on the right-hand side of (2.4) is asymptotically sharp. Therefore, to
reach (2.3) for the whole range, we need a different argument, and it
appears that a modified form of (2.4) may still be used:

Theorem 2.1. For every real-valued function $f$ on $\mathcal{G}_{n,k}$,

(2.5)   $(n + 2)\,\mathrm{Ent}_\mu(e^f) \le \mathcal{E}(e^f, f) \le \int |\nabla f|^2 e^f\,d\mu.$

In particular, if $|\nabla f| \le \sigma$,

(2.6)   $\mu\{|f - \mathbf{E}_\mu f| \ge h\} \le 2\exp(-(n + 2)h^2/(4\sigma^2)), \quad h \ge 0.$

The Dirichlet form that appears in the middle of (2.5) is defined
canonically by

$\mathcal{E}(f, g) = \int \langle \nabla f(x), \nabla g(x) \rangle\,d\mu(x) = \int \sum_{y:\,\rho(x,y)=1} (f(x) - f(y))(g(x) - g(y))\,d\mu(x),$

where $f$ and $g$ are arbitrary functions on $\mathcal{G}_{n,k}$. The
estimate (2.6) is obtained from (2.5) by applying the latter to functions
$tf$: it then yields a distributional inequality
$(n + 2)\,\mathrm{Ent}_\mu(e^{tf}) \le \sigma^2 t^2\, \mathbf{E}_\mu e^{tf}$,
which is known to imply the bound

$\mathbf{E}_\mu \exp(t(f - \mathbf{E}_\mu f)) \le \exp(\sigma^2 t^2/(n + 2)), \quad t \in \mathbf{R},$

on the Laplace transform of $f$ (an argument due to Ledoux (year?)).

The second inequality in (2.5) holds true for the uniform probability
measure on an arbitrary finite undirected graph, due to the elementary
estimate $(a - b)(e^a - e^b) \le (a - b)^2 (e^a + e^b)/2$,
$a, b \in \mathbf{R}$. As for the first inequality in (2.5), it comes
naturally in the Markov chain setting in connection with the problem on the
rate of convergence to the stationary distribution. In the case of
$\mathcal{G}_{n,k}$, it was recently proved (year?) in a slightly more
general form by interpolating between the Poincaré and the modified
log-Sobolev inequality, and independently (year?) where a martingale
approach was used to get an asymptotically equivalent constant on the
left-hand side of (2.5). For more details and discussions of that
inequality, we also refer the interested reader to (year?). Here, for the
sake of completeness and to emphasize the "concentration" content, we
include below a direct inductive argument.

Proof of Theorem 2.1. For $1 \le k \le n - 1$, let $A_{n,k}$ denote the
best constant in

(2.7)   $\mathrm{Ent}_\mu(f) \le A_{n,k}\, \mathcal{E}(f, \log f) = \frac{A_{n,k}}{C_n^k} \sum_{\rho(x,y)=1} R(f(x), f(y)),$

where $f$ is an arbitrary positive function on
$\mathcal{G} = \mathcal{G}_{n,k}$, $R(a, b) = (a - b)(\log a - \log b)$ for
$a, b > 0$, and the summation is performed over all ordered pairs
$(x, y) \in \mathcal{G} \times \mathcal{G}$ such that $\rho(x, y) = 1$. By
symmetry, $A_{n,k} = A_{n,n-k}$.

When $k = 1$, $\mathcal{G}$ represents a graph of size $n$ where all
different points are neighbors of each other (a complete graph). In this
case, by Jensen's inequality,

$\mathrm{Ent}_\mu(f) \le \mathrm{cov}_\mu(f, \log f) = \frac{1}{2n^2} \sum_{x \ne y} R(f(x), f(y)) = \frac{1}{2n}\, \mathcal{E}(f, \log f).$

Hence, $A_{n,1} \le \frac{1}{2n}$. As for $k \ge 2$, we deduce a recursive
inequality that relates $A_{n,k}$ to $A_{n-1,k-1}$, and then we may proceed
by induction. Thus, fix $k \ge 2$ and a positive function $f$ on
$\mathcal{G}$ with $\int f\,d\mu = 1$ [this can be assumed in view of the
homogeneity of (2.7)]. Introduce subgraphs

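$\mathcal{G}_i = \{x \in \mathcal{G} : x_i = 1\}, \quad 1 \le i \le n,$

and equip them with uniform probability measures $\mu_i$. Since all
$\mathcal{G}_i$ can be identified with $\mathcal{G}_{n-1,k-1}$, we may
write the definition (2.7) for these graphs:

$\int_{\mathcal{G}_i} f \log f\,d\mu_i \le \int_{\mathcal{G}_i} f\,d\mu_i\, \log \int_{\mathcal{G}_i} f\,d\mu_i + \frac{A_{n-1,k-1}}{C_{n-1}^{k-1}} \sum_{x \in \mathcal{G}_i} \sum_{y \in \mathcal{G}_i,\ \rho(x,y)=1} R(f(x), f(y)).$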

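Put $a_i = \int f\,d\mu_i$. Summing the above inequalities over all
$i \le n$ with weight $\frac{1}{n}$ and making use of
$\frac{1}{n} \sum_{i=1}^n \mu_i = \mu$, we get

(2.8)   $\int f \log f\,d\mu \le \frac{1}{n} \sum_{i=1}^n a_i \log a_i + \frac{A_{n-1,k-1}}{n\, C_{n-1}^{k-1}} \sum_{i=1}^n \sum_{x \in \mathcal{G}_i} \sum_{y \in \mathcal{G}_i,\ \rho(x,y)=1} R(f(x), f(y)).$

Since $\frac{1}{n} \sum_{i=1}^n a_i = \int f\,d\mu = 1$, the first term in
(2.8) is estimated from above, according to the case $k = 1$ in (2.7), by
$\frac{A_{n,1}}{n} \sum_{i \ne j} R(a_i, a_j)$. Hence, (2.8) implies

$\mathrm{Ent}_\mu(f) \le \frac{A_{n,1}}{n} \sum_{i \ne j} R(a_i, a_j) + \frac{A_{n-1,k-1}}{n\, C_{n-1}^{k-1}} \sum_{i=1}^n \sum_{x \in \mathcal{G}_i} \sum_{y \in \mathcal{G}_i,\ \rho(x,y)=1} R(f(x), f(y)).$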

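Now, given $x, y \in \mathcal{G}$ with $\rho(x, y) = 1$, the number of all
$i$ such that $x \in \mathcal{G}_i$ and $y \in \mathcal{G}_i$
simultaneously is equal to $k - 1$. Hence, the triple sum contributes

$(k - 1) \sum_{x \in \mathcal{G}} \sum_{y \in \mathcal{G},\ \rho(x,y)=1} R(f(x), f(y)) = (k - 1)\, C_n^k\, \mathcal{E}(f, \log f).$

Since $\frac{(k - 1)\, C_n^k}{n\, C_{n-1}^{k-1}} = \frac{k - 1}{k}$, we
thus get

(2.9)   $\mathrm{Ent}_\mu(f) \le \frac{A_{n,1}}{n} \sum_{i \ne j} R(a_i, a_j) + \frac{k - 1}{k}\, A_{n-1,k-1}\, \mathcal{E}(f, \log f).$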

To treat the sum in (2.9), note that, for each couple $i \ne j$, the map
$s_{ij} : \{0,1\}^n \to \{0,1\}^n$ acts as a bijection between
$\mathcal{G}_i$ and $\mathcal{G}_j$, pushing $\mu_i$ forward onto $\mu_j$
(whenever $k \ge 2$). In particular,
$a_j = \int f(y)\,d\mu_j(y) = \int f(s_{ij}x)\,d\mu_i(x)$. Hence, by
convexity of $R$ in the positive quarter $a, b > 0$ and Jensen's
inequality,

$R(a_i, a_j) = R\Bigl(\int f(x)\,d\mu_i(x), \int f(s_{ij}x)\,d\mu_i(x)\Bigr) \le \int R(f(x), f(s_{ij}x))\,d\mu_i(x).$

Therefore,

(2.10)   $\sum_{i \ne j} R(a_i, a_j) \le \frac{1}{C_{n-1}^{k-1}} \sum_{i \ne j} \sum_{x \in \mathcal{G}_i} R(f(x), f(s_{ij}x)).$

Note that $y = s_{ij}x$ always implies $\rho(x, y) \le 1$ and in the case
$x \in \mathcal{G}_i$, the equality $\rho(x, y) = 1$ is only possible when
$x_i = 1$, $x_j = 0$. Hence, the double sum in (2.10) contains only terms
$R(f(x), f(y))$ with $\rho(x, y) = 1$ [the cases $\rho(x, y) = 0$ can be
excluded, since then $R(f(x), f(y)) = 0$]. In turn, for any couple
$x, y \in \mathcal{G}$ such that $\rho(x, y) = 1$, there is a unique pair
$(i, j)$ such that $i \ne j$ and $y = s_{ij}x$. Thus, the right-hand side
of (2.10) turns into

$\frac{1}{C_{n-1}^{k-1}} \sum_{x \in \mathcal{G}} \sum_{y \in \mathcal{G},\ \rho(x,y)=1} R(f(x), f(y)) = \frac{n}{k}\, \mathcal{E}(f, \log f),$

and we finally get, from (2.9),

$\mathrm{Ent}_\mu(f) \le \frac{A_{n,1} + (k - 1) A_{n-1,k-1}}{k}\, \mathcal{E}(f, \log f).$
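Hence, $A_{n,k} \le \frac{1}{k}(A_{n,1} + (k - 1) A_{n-1,k-1})$, or
$B_{n,k} \le A_{n,1} + B_{n-1,k-1}$ in terms of $B_{n,k} = k A_{n,k}$.
Applying this inequality successively $k - 1$ times and recalling that
$A_{n,1} \le \frac{1}{2n}$, we arrive at

$B_{n,k} \le \frac{1}{2n} + \frac{1}{2(n - 1)} + \cdots + \frac{1}{2(n - (k - 2))} + \frac{1}{2(n - (k - 1))}.$

If $k \le \frac{n}{2}$, each of the above $k$ terms does not exceed
$\frac{1}{n + 2}$, so $B_{n,k} \le \frac{k}{n + 2}$. This yields the
desired estimate $A_{n,k} \le \frac{1}{n + 2}$. In the case
$k > \frac{n}{2}$, we have $A_{n,k} = A_{n,n-k}$, and Theorem 2.1
follows. □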
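Both inequalities in (2.5) can be tested numerically on small slices. The sketch below is illustrative only: the helper names are hypothetical, the test function is random, and the Dirichlet form is taken, as in (2.7), to be the average over $x$ of the sum over the $k(n-k)$ neighbors of $x$. It returns the three quantities $(n+2)\,\mathrm{Ent}_\mu(e^f)$, $\mathcal{E}(e^f, f)$ and $\int|\nabla f|^2 e^f\,d\mu$, which by Theorem 2.1 should appear in nondecreasing order.

```python
import math
import random
from itertools import combinations


def slice_points(n, k):
    """All 0/1 vectors of length n with exactly k ones."""
    return [tuple(1 if i in ones else 0 for i in range(n))
            for ones in map(set, combinations(range(n), k))]


def neighbors(x):
    """All points obtained by moving a one of x to a zero position."""
    ones = [i for i, v in enumerate(x) if v == 1]
    zeros = [j for j, v in enumerate(x) if v == 0]
    out = []
    for i in ones:
        for j in zeros:
            y = list(x)
            y[i], y[j] = 0, 1
            out.append(tuple(y))
    return out


def theorem_21_sides(f, n, k):
    """Return ((n+2) Ent(e^f), E(e^f, f), int |grad f|^2 e^f dmu)."""
    pts = slice_points(n, k)
    m = len(pts)
    mean_exp = sum(math.exp(f[x]) for x in pts) / m
    # Ent(g) = E g log g - E g log E g with g = e^f.
    entropy = (sum(f[x] * math.exp(f[x]) for x in pts) / m
               - mean_exp * math.log(mean_exp))
    dirichlet = sum((math.exp(f[x]) - math.exp(f[y])) * (f[x] - f[y])
                    for x in pts for y in neighbors(x)) / m
    weighted = sum(math.exp(f[x]) * (f[x] - f[y]) ** 2
                   for x in pts for y in neighbors(x)) / m
    return (n + 2) * entropy, dirichlet, weighted


def random_function(n, k, seed):
    """A hypothetical test function with values uniform in [-2, 2]."""
    rng = random.Random(seed)
    return {x: rng.uniform(-2.0, 2.0) for x in slice_points(n, k)}


if __name__ == "__main__":
    n, k = 6, 3
    print(theorem_21_sides(random_function(n, k, seed=0), n, k))
```

A random check of this kind cannot replace the inductive proof; it only guards against misreading the normalization of the Dirichlet form.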
3. Proving Theorem 1.1. We turn to the proof of Theorem 1.1 and to
the original definition of $\mathcal{G}_{n,k}$ as a collection of all
subsets of $\{1, \dots, n\}$ of cardinality $k$. We always assume the basic
orthonormal hypothesis (1.1) on the sequence $X_1, \dots, X_n$.

First we focus on the concentration property of the family $\{F_\tau\}$ in
terms of their characteristic functions

$f_\tau(t) = \mathbf{E}\, e^{itS_\tau},$

viewed as complex-valued functions on $\mathcal{G}_{n,k}$ with parameter
$t$. As a second step, concentration of values of $f_\tau(t)$ around its
$\mu$-mean,

$f(t) = \int f_\tau(t)\,d\mu(\tau) = \int_{-\infty}^{+\infty} e^{itx}\,dF(x),$

is converted, with the help of standard facts from Fourier analysis, into
the concentration property of distributions in the form (1.3). Although
this route is different from that in (year?) or (year?) for the case of the
sphere, it has proved to work well on the discrete cube (year?) (see also
(year?)).

Lemma 3.1. For every $t \in \mathbf{R}$, the function
$\tau \to f_\tau(t)$ has gradient on $\mathcal{G}_{n,k}$ satisfying

$|\nabla f_\tau(t)| \le (|t| + t^2) \sqrt{\frac{n}{k}}, \quad \tau \in \mathcal{G}_{n,k}.$

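Proof. Every $\tau$ in $\mathcal{G}_{n,k}$ has $k(n - k)$ neighbors in
$\mathcal{G}_{n,k}$,

$\tau_{u,v} = (\tau \setminus \{u\}) \cup \{v\}, \quad u \in \tau,\ v \notin \tau.$

Hence $S_\tau - S_{\tau_{u,v}} = (X_u - X_v)/\sqrt{k}$ and

$f_\tau(t) - f_{\tau_{u,v}}(t) = \mathbf{E} \exp(itS_\tau)\bigl(1 - \exp(-it(X_u - X_v)/\sqrt{k})\bigr).$

Given a complex-valued function $g$ on $\mathcal{G}_{n,k}$, we apply the
equivalent representation for the modulus of gradient,

$|\nabla g(\tau)| = \sup \Bigl| \sum_{u \in \tau} \sum_{v \notin \tau} a_{u,v} (g(\tau) - g(\tau_{u,v})) \Bigr|,$

where the supremum runs over all collections of complex numbers $a_{u,v}$
such that $\sum_{u \in \tau} \sum_{v \notin \tau} |a_{u,v}|^2 = 1$. In
particular, for $g(\tau) = f_\tau(t)$ we have

$|\nabla f_\tau(t)| = \sup \Bigl| \mathbf{E} \exp(itS_\tau) \sum_{u \in \tau} \sum_{v \notin \tau} a_{u,v} \bigl(1 - \exp(-it(X_u - X_v)/\sqrt{k})\bigr) \Bigr|$
$\le \sup \mathbf{E} \Bigl| \sum_{u \in \tau} \sum_{v \notin \tau} a_{u,v} \bigl(1 - \exp(-it(X_u - X_v)/\sqrt{k})\bigr) \Bigr|.$

Using the estimate $|e^{ia} - 1 - ia| \le \frac{1}{2} a^2$
($a \in \mathbf{R}$), the assumption $\mathbf{E}(X_u - X_v)^2 = 2$ and the
identity
$\sup \sum_{u \in \tau} \sum_{v \notin \tau} |a_{u,v}| = \sqrt{k(n - k)}$,
we can continue to get

$|\nabla f_\tau(t)| \le \frac{|t|}{\sqrt{k}} \sup \mathbf{E} \Bigl| \sum_{u \in \tau} \sum_{v \notin \tau} a_{u,v} (X_u - X_v) \Bigr| + \frac{t^2}{2k} \sup \mathbf{E} \sum_{u \in \tau} \sum_{v \notin \tau} |a_{u,v}| (X_u - X_v)^2$
$= \frac{|t|}{\sqrt{k}} \sup \mathbf{E} \Bigl| \sum_{u \in \tau} \sum_{v \notin \tau} a_{u,v} (X_u - X_v) \Bigr| + \frac{t^2}{k} \sqrt{k(n - k)}.$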

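To treat the last double sum, introduce
$b_u = \sum_{v \notin \tau} a_{u,v}$ and
$c_v = \sum_{u \in \tau} a_{u,v}$, so we can write

$\sum_{u \in \tau} \sum_{v \notin \tau} a_{u,v} (X_u - X_v) = \sum_{u \in \tau} b_u X_u - \sum_{v \notin \tau} c_v X_v.$

By (1.1),

$\mathbf{E} \Bigl| \sum_{u \in \tau} \sum_{v \notin \tau} a_{u,v} (X_u - X_v) \Bigr|^2 = \mathbf{E} \Bigl| \sum_{u \in \tau} b_u X_u \Bigr|^2 + \mathbf{E} \Bigl| \sum_{v \notin \tau} c_v X_v \Bigr|^2 = \sum_{u \in \tau} |b_u|^2 + \sum_{v \notin \tau} |c_v|^2,$

but by Cauchy's inequality,

$|b_u|^2 \le (n - k) \sum_{v \notin \tau} |a_{u,v}|^2, \quad |c_v|^2 \le k \sum_{u \in \tau} |a_{u,v}|^2,$

so $\sum_{u \in \tau} |b_u|^2 + \sum_{v \notin \tau} |c_v|^2 \le (n - k) + k = n$.
Therefore, once more by Cauchy's inequality,

$\mathbf{E} \Bigl| \sum_{u \in \tau} \sum_{v \notin \tau} a_{u,v} (X_u - X_v) \Bigr| \le \sqrt{n}.$

Thus, we arrive at the bound
$|\nabla f_\tau(t)| \le |t| \sqrt{\frac{n}{k}} + t^2 \sqrt{\frac{n - k}{k}}$,
which finishes the proof. □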


Corollary 3.2. For every $t > 0$ and $h > 0$,

(3.1)   $\mu\Bigl\{ \frac{|f_\tau(t) - f(t)|}{t} \ge h \Bigr\} \le 4 \exp\Bigl( \frac{-k h^4}{8(2 + h)^2} \Bigr).$

Proof. Indeed, if $t > \frac{2}{h}$, the probability on the left-hand side
is zero, since $|f_\tau(t) - f(t)| \le 2$. In the other case
$t \le \frac{2}{h}$, consider the function
$g(\tau) = (f_\tau(t) - f(t))/t$. It has $\mu$-mean zero, and according to
Lemma 3.1, its modulus of gradient is bounded by
$(1 + t)\sqrt{\frac{n}{k}} \le (1 + \frac{2}{h})\sqrt{\frac{n}{k}}$. The
same is true for the real and imaginary parts $g_1 = \mathrm{Re}\, g$ and
$g_2 = \mathrm{Im}\, g$. Thus, we are in a position to apply Theorem 2.1,
which gives [replacing $n + 2$ with $n$ in (2.6)]

$\mu\{|g| \ge h\} = \mu\{|g_1|^2 + |g_2|^2 \ge h^2\}$
$\le \mu\{|g_1| \ge h/\sqrt{2}\} + \mu\{|g_2| \ge h/\sqrt{2}\}$
$\le 4 \exp(-k h^2/8(1 + 2/h)^2).$

Corollary 3.2 follows. □
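The last step of this proof is a short piece of arithmetic that is easy to get wrong, so here is a minimal sketch of it (the function names are hypothetical). It evaluates the right-hand side of (3.1) directly and, separately, through the route of the proof: the bound (2.6) with $n$ in place of $n + 2$, Lipschitz constant $\sigma = (1 + 2/h)\sqrt{n/k}$ and deviation level $h/\sqrt{2}$, applied to both the real and the imaginary part. The two values should coincide, and $n$ should cancel.

```python
import math


def corollary_bound(k, h):
    """Right-hand side of (3.1): 4 exp(-k h^4 / (8 (2 + h)^2))."""
    return 4.0 * math.exp(-k * h ** 4 / (8.0 * (2.0 + h) ** 2))


def bound_via_theorem_21(k, n, h):
    """Same bound, assembled as in the proof of Corollary 3.2."""
    sigma = (1.0 + 2.0 / h) * math.sqrt(n / k)   # gradient bound for g
    level = h / math.sqrt(2.0)                   # level for Re g and Im g
    one_part = 2.0 * math.exp(-n * level ** 2 / (4.0 * sigma ** 2))
    return 2.0 * one_part                        # real part + imaginary part


if __name__ == "__main__":
    for k, n, h in [(50, 1000, 0.5), (200, 5000, 1.0), (10, 20, 2.0)]:
        print(k, n, h, corollary_bound(k, h), bound_via_theorem_21(k, n, h))
```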


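By continuity, inequality (3.1) continues to hold in the limit case
$t = 0$. Since

$\mathbf{E}_\mu S_\tau = \int S_\tau\,d\mu(\tau) = \frac{1}{C_n^k} \sum_\tau \frac{X_{i_1} + \cdots + X_{i_k}}{\sqrt{k}} = \sqrt{k}\, \bar{X},$

where $\bar{X} = (X_1 + \cdots + X_n)/n$, the limiting case becomes

(3.2)   $\mu\{\tau : |\mathbf{E} S_\tau - \sqrt{k}\, \mathbf{E} \bar{X}| \ge h\} \le 4 \exp(-k h^4/8(2 + h)^2).$

Thus, under (1.1), the function $g(\tau) = \mathbf{E} S_\tau$ on
$\mathcal{G}_{n,k}$ is strongly concentrated around its mean
$\mathbf{E}_\mu g = \sqrt{k}\, \mathbf{E} \bar{X}$.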

For the next step, it is important to sharpen inequality (3.1) by making
it uniform with respect to the parameter $t$. In other words, we need to
control $\sup_{t > 0} (|f_\tau(t) - f(t)|)/t$. This can be achieved at the
expense of a small deterioration of the bound on the right-hand side of
(3.1). Indeed, let us apply (3.1) to points $t_r = r h^2$,
$r = 0, 1, \dots, N = [\frac{2}{h^3}] + 1$, where $[\,\cdot\,]$ stands for
the integer part of a real number and where the case $r = 0$ is understood
as the inequality (3.2). Then we get

(3.3)   $\mu\Bigl\{ \max_{0 \le r \le N} \frac{|f_\tau(t_r) - f(t_r)|}{t_r} \ge h \Bigr\} \le \sum_{r=0}^{N} \mu\Bigl\{ \frac{|f_\tau(t_r) - f(t_r)|}{t_r} \ge h \Bigr\}$
$\le 4(N + 1) \exp\Bigl( \frac{-k h^4}{8(2 + h)^2} \Bigr) \le 4\Bigl( \frac{2}{h^3} + 2 \Bigr) \exp\Bigl( \frac{-k h^4}{8(2 + h)^2} \Bigr).$

To involve all remaining values of $t > 0$ in the maximum on the left-hand
side, we may assume, as in the proof of Corollary 3.2, that
$0 \le t \le \frac{2}{h}$. Let $\mathcal{G}(h)$ denote the collection of
all $\tau \in \mathcal{G}_{n,k}$ such that
$|f_\tau(t_r) - f(t_r)|/t_r < h$ for all $r = 0, 1, \dots, N$
simultaneously. Recall that $\mathbf{E} S_\tau^2 = 1$, so
$|f_\tau'(t)| \le 1$ and $|f_\tau''(t)| \le 1$, and similarly for $f$.

Case 1. $0 \le t \le h$. By Taylor's expansion,

$\frac{f_\tau(t) - f(t)}{t} = i\,\mathbf{E}(S_\tau - \sqrt{k}\,\bar{X}) + t \int_0^1 (1 - v)\bigl(f_\tau''(tv) - f''(tv)\bigr)\,dv.$

Hence, if $\tau \in \mathcal{G}(h)$ and in particular
$|\mathbf{E}(S_\tau - \sqrt{k}\,\bar{X})| < h$, we get

$\frac{|f_\tau(t) - f(t)|}{t} \le h + t \le 2h.$

Case 2. $h \le t \le \frac{2}{h}$. Pick an index $r = 0, \dots, N - 1$ such
that $t_r \le t < t_{r+1}$. Recalling that $t_{r+1} - t_r = h^2$ and
applying the Lipschitz property of $f_\tau$ and $f$, we may write

$|f_\tau(t) - f(t)| \le |f_\tau(t) - f_\tau(t_r)| + |f_\tau(t_r) - f(t_r)| + |f(t_r) - f(t)|$
$\le 2|t - t_r| + t_r h \le 2h^2 + t_r h \le 2h^2 + th \le 3th.$

The assumption $t \ge h$ was used on the last step.


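Thus, in both cases we obtain that $\tau \in \mathcal{G}(h)$ implies
$\sup_{t > 0} (|f_\tau(t) - f(t)|/t) \le 3h$. Consequently, by (3.3),

(3.4)   $\mu\Bigl\{ \sup_{t > 0} \frac{|f_\tau(t) - f(t)|}{t} \ge 3h \Bigr\} \le 4\Bigl( \frac{2}{h^3} + 2 \Bigr) \exp\Bigl( \frac{-k h^4}{8(2 + h)^2} \Bigr), \quad h > 0.$

This is a desired sharpening of (3.1).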


Proof of Theorem 1.1. We use the following observation due to Bohman
(year?). Given characteristic functions $\varphi_1$ and $\varphi_2$ of the
distribution functions $F_1$ and $F_2$, respectively, if
$|\varphi_1(t) - \varphi_2(t)| \le \lambda t$ for $t > 0$, then, for all
$x \in \mathbf{R}$ and $a > 0$,

$F_1(x - a) - \frac{2\lambda}{a} \le F_2(x) \le F_1(x + a) + \frac{2\lambda}{a}.$

The particular case $a = \sqrt{2\lambda}$ gives an important relationship,

(3.5)   $\frac{1}{2} L^2(F_1, F_2) \le \sup_{t > 0} \frac{|\varphi_1(t) - \varphi_2(t)|}{t},$

between characteristic functions and the Lévy distance. Therefore, by (3.4)
and (3.5),

$\mu\Bigl\{ \frac{1}{2} L^2(F_\tau, F) \ge 3h \Bigr\} \le 4\Bigl( \frac{2}{h^3} + 2 \Bigr) \exp\Bigl( \frac{-k h^4}{8(2 + h)^2} \Bigr).$

Replacing $6h$ with $\delta^2$ and noticing that only $0 < \delta \le 1$
should be taken into consideration, we arrive at the estimate

$\mu\{L(F_\tau, F) \ge \delta\} \le \frac{C}{\delta^6} \exp(-c k \delta^8), \quad \delta > 0,$

with some positive numerical constants $C$ and $c$. On the other hand, in
the latter inequality, we may restrict ourselves to values
$\delta \ge c_1 k^{-1/8}$, which make the bound
$(C/\delta^6) \exp(-c k \delta^8)$ smaller than 1, and then we arrive at
the required inequality (1.3). Theorem 1.1 has been proved. □


4. Elementary symmetric polynomials. We turn to the next natural
question regarding approximation of the average distribution function $F$.
According to the definition (1.2), it has characteristic function

(4.1)   $f(t) = \frac{1}{C_n^k} \sum \mathbf{E}\, e^{itX_{i_1}/\sqrt{k}} \cdots e^{itX_{i_k}/\sqrt{k}}, \quad t \in \mathbf{R},$

with summation over all increasing sequences
$1 \le i_1 < \cdots < i_k \le n$. To better understand the possible
behavior of such sums, introduce normalized elementary symmetric
polynomials in $n$ complex variables of degree $k$:

$\sigma_k(z) = \frac{1}{C_n^k} \sum_{i_1 < \cdots < i_k} z_{i_1} \cdots z_{i_k}, \quad z = (z_1, \dots, z_n) \in \mathbf{C}^n.$

An account of basic results and some other interesting properties of such
polynomials can be found in (year?). For our purposes, it is desirable to
relate $\sigma_k$ to arithmetic means

$\bar{z} = \frac{z_1 + \cdots + z_n}{n}.$

In this section we derive the following statement of independent interest
which seems to be absent from the literature (cf. also (year?) for a more
general scheme).

Proposition 4.1. If $|z_j| \le 1$, $j = 1, \dots, n$, then for all
$1 \le k \le n$,

(4.2)   $|\sigma_k(z) - \bar{z}^k| \le 6\, \frac{k - 1}{n - 1}.$


Since $|z_j| \le 1$, both quantities satisfy $|\sigma_k(z)| \le 1$ and
$|\bar{z}^k| \le 1$, so $|\sigma_k(z) - \bar{z}^k| \le 2$. We can easily
refine this bound by applying the polynomial formula

$\bar{z}^k = \frac{1}{n^k} \sum_{p_1 + \cdots + p_n = k} \frac{k!}{p_1! \cdots p_n!}\, z_1^{p_1} \cdots z_n^{p_n} = \frac{k!\, C_n^k}{n^k}\, \sigma_k(z) + \mathrm{remainder}(z).$
Then we obtain immediately the estimate

$|\sigma_k(z) - \bar{z}^k| \le 2\Bigl( 1 - \frac{k!\, C_n^k}{n^k} \Bigr).$

Here, the right-hand side gets small only in the range $k = o(\sqrt{n})$,
in which case it is of order $k^2/n$. The bound of order $\frac{k}{n}$ in
(4.2) is asymptotically sharp, but its proof requires more sophisticated
arguments.
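Proposition 4.1 is easy to probe numerically before reading its proof. The sketch below is illustrative and uses hypothetical helper names: it computes $\sigma_k(z)$ by the standard dynamic-programming recurrence for elementary symmetric polynomials, draws random points of the closed unit disk, and compares $|\sigma_k(z) - \bar{z}^k|$ with the bound $6(k-1)/(n-1)$ from (4.2).

```python
import cmath
import math
import random


def normalized_esp(z, k):
    """sigma_k(z): elementary symmetric polynomial e_k(z) divided by C(n, k)."""
    e = [0j] * (k + 1)
    e[0] = 1 + 0j
    for zj in z:
        # Update in decreasing degree so each z_j is used at most once.
        for d in range(k, 0, -1):
            e[d] += zj * e[d - 1]
    return e[k] / math.comb(len(z), k)


def gap(z, k):
    """|sigma_k(z) - zbar^k|, the left-hand side of (4.2)."""
    zbar = sum(z) / len(z)
    return abs(normalized_esp(z, k) - zbar ** k)


def random_disk_points(n, rng):
    """n points drawn uniformly from the closed unit disk."""
    return [cmath.rect(math.sqrt(rng.random()), 2 * math.pi * rng.random())
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    n = 12
    z = random_disk_points(n, rng)
    for k in range(1, n + 1):
        print(k, gap(z, k), 6 * (k - 1) / (n - 1))
```

Random points typically give gaps far below the bound; extremal configurations are not random, so this is a sanity check of the statement and of the normalization, not evidence of sharpness.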


Proof of Proposition 4.1. Let $A_{n,k}$ denote the maximum of the
left-hand side in (4.2) over all possible vectors $z$ with $|z_j| \le 1$
for all $1 \le j \le n$, and let $B_k$ be an optimal constant in

$A_{n,k} \le B_k\, \frac{k - 1}{n - 1}, \quad n \ge k.$

We need a uniform bound on $B_k$. The case $k = 1$ is trivial, since then
$A_{n,1} = 0$. If $k = 2$, by simple algebra,

$\sigma_2(z) - \bar{z}^2 = -\frac{1}{n(n - 1)} \sum_{j=1}^n (z_j - \bar{z})^2,$

so

$|\sigma_2(z) - \bar{z}^2| \le \frac{1}{n(n - 1)} \sum_{j=1}^n |z_j - \bar{z}|^2 \le \frac{1}{n - 1}(1 - |\bar{z}|^2) \le \frac{1}{n - 1}.$

Hence, $A_{n,2} \le \frac{1}{n - 1}$ and $B_2 \le 1$. To bound the
remaining constants, we deduce recursive inequalities that relate
$A_{n,k}$ to $A_{n-1,k-1}$ (and then we can argue as in the proof of
Theorem 2.1).

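Thus let $n \ge k \ge 3$. With every $z \in \mathbf{C}^n$ we associate $n$
vectors in $\mathbf{C}^{n-1}$,

$z_{(j)} = (z_1, \dots, z_{j-1}, z_{j+1}, \dots, z_n), \quad 1 \le j \le n.$

In what follows we always assume $|z_j| \le 1$ for all $1 \le j \le n$. Let
us mention several simple immediate properties and identities:

1. For all $j \le n$, $|\bar{z}_{(j)}| \le 1$.

2. We have $\bar{z}_{(j)} - \bar{z} = -(z_j - \bar{z})/(n - 1)$.

3. On the other hand, $\bar{z}_{(j)} - \bar{z} = (\bar{z}_{(j)} - z_j)/n$,
so we always have $|\bar{z}_{(j)} - \bar{z}| \le \frac{2}{n}$.

4. We have $\bar{z} = \frac{1}{n} \sum_{j=1}^n z_j$.

5. We have $\sigma_k(z) = \frac{1}{n} \sum_{j=1}^n z_j\, \sigma_{k-1}(z_{(j)})$.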

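From items 4 and 5 we obtain the representation

$\sigma_k(z) - \bar{z}^k = \frac{1}{n} \sum_{j=1}^n z_j \bigl( \sigma_{k-1}(z_{(j)}) - \bar{z}_{(j)}^{k-1} \bigr) + \frac{1}{n} \sum_{j=1}^n z_j \bigl( \bar{z}_{(j)}^{k-1} - \bar{z}^{k-1} \bigr).$

Hence,

(4.3)   $|\sigma_k(z) - \bar{z}^k| \le A_{n-1,k-1} + \Bigl| \frac{1}{n} \sum_{j=1}^n z_j \bigl( \bar{z}_{(j)}^{k-1} - \bar{z}^{k-1} \bigr) \Bigr|.$

Thus, our task is to bound the last term on the right-hand side properly.
One natural possibility is to use the expansion
$\bar{z}_{(j)}^{k-1} - \bar{z}^{k-1} = (\bar{z}_{(j)} - \bar{z}) \sum_{u=0}^{k-2} \bar{z}_{(j)}^{u}\, \bar{z}^{k-u-2}$.
Then by identity 3,
$|\bar{z}_{(j)}^{k-1} - \bar{z}^{k-1}| \le \frac{2(k - 1)}{n}$. Applying
this estimate in (4.3), we arrive at

(4.4)   $A_{n,k} \le A_{n-1,k-1} + \frac{2(k - 1)}{n}.$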
Successive application of this inequality leads to the rough bound
$A_{n,k} = O(k^2/n)$. Nevertheless, (4.4) can be useful for small values of
$k$. For example, if $k = 3$, we get

$A_{n,3} \le A_{n-1,2} + \frac{4}{n} \le \frac{1}{n - 2} + \frac{4}{n} = \frac{1}{n - 1}\Bigl( 5 - \frac{3n - 8}{n(n - 2)} \Bigr) \le \frac{5}{n - 1}, \quad n \ge 3,$

so $B_3 \le \frac{5}{2}$. Similarly, for $k = 4$, $n \ge 4$, by the
previous step,

(4.5)   $A_{n,4} \le A_{n-1,3} + \frac{6}{n} \le \frac{1}{n - 3} + \frac{4}{n - 1} + \frac{6}{n} = \frac{1}{n - 1}\Bigl( 11 - \frac{4n - 18}{n(n - 3)} \Bigr) \le \frac{12}{n - 1},$

so $B_4 \le 4$. Hence, $B_k \le 4$ for $k \le 4$, as stated in (4.2).

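Thus, assume $n \ge k \ge 5$. We need a more careful estimate of the
right-hand side of (4.3) that is independent of $k$. From Taylor's
expansion in the integral form, with integration along a segment on the
plane connecting two points $a, a_0 \in \mathbf{C}$, we have a canonical
estimate

$|a^{k-1} - a_0^{k-1} - (k - 1) a_0^{k-2} (a - a_0)| \le \frac{(k - 1)(k - 2)}{2}\, |a - a_0|^2 \max\{|a_0|^{k-3}, |a|^{k-3}\}.$

In particular, when $a_0 = \bar{z}$, $a = \bar{z}_{(j)}$, we may write,
applying property 3,

$\bar{z}_{(j)}^{k-1} - \bar{z}^{k-1} = (k - 1)(\bar{z}_{(j)} - \bar{z})\, \bar{z}^{k-2} + \theta_j\, \frac{(k - 1)(k - 2)}{2}\, |\bar{z}_{(j)} - \bar{z}|^2 \Bigl( |\bar{z}| + \frac{2}{n} \Bigr)^{k-3}$

for some $|\theta_j| \le 1$. Hence, by statement 2,

$\frac{1}{n} \sum_{j=1}^n z_j \bigl( \bar{z}_{(j)}^{k-1} - \bar{z}^{k-1} \bigr) = -\frac{k - 1}{n(n - 1)}\, \bar{z}^{k-2} \sum_{j=1}^n z_j (z_j - \bar{z}) + \frac{(k - 1)(k - 2)}{2(n - 1)^2} \Bigl( |\bar{z}| + \frac{2}{n} \Bigr)^{k-3} \frac{1}{n} \sum_{j=1}^n \theta_j z_j |z_j - \bar{z}|^2.$

However, $\sum_{j=1}^n z_j (z_j - \bar{z}) = \sum_{j=1}^n (z_j - \bar{z})^2$
is bounded in absolute value by $n(1 - |\bar{z}|^2)$, and the same is true
for $\sum_{j=1}^n |z_j - \bar{z}|^2$. Therefore,

(4.6)   $\Bigl| \frac{1}{n} \sum_{j=1}^n z_j \bigl( \bar{z}_{(j)}^{k-1} - \bar{z}^{k-1} \bigr) \Bigr| \le \frac{k - 1}{n - 1}\, |\bar{z}|^{k-2} (1 - |\bar{z}|^2)$

(4.7)   $\qquad + \frac{(k - 1)(k - 2)}{2(n - 1)^2} \Bigl( |\bar{z}| + \frac{2}{n} \Bigr)^{k-3} (1 - |\bar{z}|^2).$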

To bound the expression in (4.6), note that, given $r > 1$, a function of
the form $\psi(b) = b^{r-1}(1 - b)$ is maximized in $0 \le b \le 1$ at
$b = 1 - \frac{1}{r}$ and its maximum
$(1 - \frac{1}{r})^r\, \frac{1}{r - 1}$ can be bounded by
$\frac{1}{e(r - 1)}$. Applying this observation with $b = |\bar{z}|^2$ and
$r = \frac{k}{2}$, we conclude that

(4.8)   $\frac{k - 1}{n - 1}\, |\bar{z}|^{k-2} (1 - |\bar{z}|^2) \le \frac{2}{e}\, \frac{k - 1}{k - 2}\, \frac{1}{n - 1} \le \frac{1}{n - 1},$

where we used the assumption $k \ge 5$.

Next, to bound the expression in (4.7), consider a function of the form
$\psi(b) = (b + \varepsilon)^{r-1}(1 - b)$ with
$\varepsilon = \frac{2}{n}$, $r > 1 + \varepsilon$. It is maximized in
$0 \le b \le 1$ at $b = 1 - \frac{1 + \varepsilon}{r}$ and its maximum
$(1 + \varepsilon)^r (1 - \frac{1}{r})^r\, \frac{1}{r - 1}$ can be bounded
by $((1 + \varepsilon)^r)/(e(r - 1))$. In particular, with $b = |\bar{z}|$
and $r = k - 2$ this yields

$\frac{(k - 1)(k - 2)}{2(n - 1)^2} \Bigl( |\bar{z}| + \frac{2}{n} \Bigr)^{k-3} (1 - |\bar{z}|)(1 + |\bar{z}|) \le \frac{(k - 1)(k - 2)}{e(k - 3)(n - 1)^2} \Bigl( 1 + \frac{2}{n} \Bigr)^{k-2}.$

Hence, using
$(1 + \frac{2}{n})^{k-2} \le (1 + \frac{2}{n})^{n-2} \le e^2/(1 + \frac{2}{n})^2$
and $\frac{(k - 1)(k - 2)}{k - 3} = k + \frac{2}{k - 3} \le n + 2$, we can
estimate the expression in (4.7) by $\frac{3}{n - 1}$. Together with (4.8),
the left-hand side of (4.6) is thus bounded by $\frac{4}{n - 1}$. Thus,
returning to (4.3), we obtain a more precise recursive inequality than
(4.4):

(4.9)   $A_{n,k} \le A_{n-1,k-1} + \frac{4}{n - 1}, \quad n \ge k \ge 5.$

Finally, applying (4.9) $k - 4$ times and the obtained estimate (4.5), we
get

(4.10)   $A_{n,k} \le \frac{4}{n - 1} + \frac{4}{n - 2} + \cdots + \frac{4}{n - k + 4} + A_{n-k+4,4} \le \frac{4(k - 1)}{n - k + 3}.$

In the case $k \le \frac{n}{3} + 1$, we have $n - k + 3 \ge \frac{2}{3} n$,
and (4.10) yields the desired estimate (4.2). In the other case there is
nothing to prove since then $6\, \frac{k - 1}{n - 1} \ge 2 \ge A_{n,k}$.
Proposition 4.1 is proved. □
5. Theorem 1.2 and its generalization. As before, let $X_1, \dots, X_n$ be
random variables that satisfy the orthogonality condition (1.1) and let
$1 \le k \le n$. Now we are prepared to study asymptotic properties of the
average characteristic function $f$ defined in (4.1) (of the average
distribution function $F$). Given $\omega \in \Omega$, introduce random
characteristic functions

$f_\omega(t) = \frac{1}{C_n^k} \sum \exp\Bigl( \frac{itX_{i_1}(\omega)}{\sqrt{k}} \Bigr) \cdots \exp\Bigl( \frac{itX_{i_k}(\omega)}{\sqrt{k}} \Bigr),$

$g_\omega(t) = \Bigl( \frac{\exp(itX_1(\omega)/\sqrt{k}) + \cdots + \exp(itX_n(\omega)/\sqrt{k})}{n} \Bigr)^k, \quad t \in \mathbf{R},$

where summation runs over all increasing sequences
$1 \le i_1 < \cdots < i_k \le n$. Thus, $f(t) = \mathbf{E} f_\omega(t)$.
Also put

$g(t) = \mathbf{E} g_\omega(t) = \mathbf{E} \Bigl( \frac{\exp(itX_1/\sqrt{k}) + \cdots + \exp(itX_n/\sqrt{k})}{n} \Bigr)^k.$

By Proposition 4.1, we always have
$|f_\omega(t) - g_\omega(t)| \le \frac{6k}{n}$, $t \in \mathbf{R}$, so a
similar inequality must hold for the corresponding means, that is,

(5.1)   $|f(t) - g(t)| \le \frac{6k}{n}, \quad t \in \mathbf{R}.$

Hence, when $k = o(n)$, the associated distribution functions must also be
close to each other and we may concentrate on the asymptotic behavior of
$g$ only.
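The comparison between $f_\omega$ and $g_\omega$ is a comparison between sampling $k$ indices without replacement and with replacement, and it can be illustrated for one fixed realization of the data. In the sketch below the sample `xs` is a hypothetical draw of standard normal values standing in for $X_1(\omega), \dots, X_n(\omega)$, and the function names are illustrative; $f_\omega(t)$ is computed as $\sigma_k(z)$ and $g_\omega(t)$ as $\bar{z}^k$ with $z_j = \exp(itX_j(\omega)/\sqrt{k})$, so that Proposition 4.1 applies.

```python
import cmath
import math
import random


def phases(xs, k, t):
    """z_j = exp(i t x_j / sqrt(k)) for the fixed sample xs."""
    return [cmath.exp(1j * t * x / math.sqrt(k)) for x in xs]


def f_omega(xs, k, t):
    """Average of exp(i t S_tau) over all k-subsets tau (no replacement)."""
    z = phases(xs, k, t)
    e = [0j] * (k + 1)
    e[0] = 1 + 0j
    for zj in z:
        for d in range(k, 0, -1):
            e[d] += zj * e[d - 1]
    return e[k] / math.comb(len(z), k)


def g_omega(xs, k, t):
    """Characteristic function of the normalized sum of k i.i.d. draws from xs."""
    z = phases(xs, k, t)
    return (sum(z) / len(z)) ** k


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 200, 10
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]   # assumed sample
    for t in (0.5, 1.0, 2.0):
        print(t, abs(f_omega(xs, k, t) - g_omega(xs, k, t)), 6 * k / n)
```

For such a sample the observed difference is usually much smaller than $6k/n$, which is a worst-case bound over all points of the unit disk.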

A probabilistic meaning of each function $g_\omega$ is very simple. Indeed,
given $\omega \in \Omega$, let $Y_1, \dots, Y_k$ be independent,
identically distributed random variables defined on some probability space
$(M, Q)$, whose common distribution is a sample distribution:

$Q\{Y_1 = X_j(\omega)\} = \frac{1}{n}, \quad 1 \le j \le n.$

Then, by the very definition, $g_\omega$ represents the characteristic
function of the random variable

$T_\omega = \frac{Y_1 + \cdots + Y_k}{\sqrt{k}}.$

It has $Q$-mean $\mathbf{E}_Q T_\omega = \sqrt{k}\, \bar{X}(\omega)$, where
$\bar{X}(\omega) = \frac{1}{n} \sum_{j=1}^n X_j(\omega)$ is just a sample
mean associated to the "sample" $X_1, \dots, X_n$, and has $Q$-variance

(5.2)   $\sigma^2(\omega) = \mathrm{Var}_Q(T_\omega) = \frac{1}{n} \sum_{j=1}^n (X_j(\omega) - \bar{X}(\omega))^2,$

representing the usual sample variance. For simplicity, in some places we
omit $\omega$, hoping this does not lead to confusion.

By the canonical central limit theorem, the random variable $T_\omega$ has
a distribution function, $G_\omega$, which is close to the normal
$N(\sqrt{k}\, \bar{X}, \sigma^2)$. Hence, the distribution function
$G(x) = \mathbf{E}\, G_\omega(x)$, associated to the characteristic
function $g$, is close to a $\mathbf{P}$-mixture of
$N(\sqrt{k}\, \bar{X}, \sigma^2)$-distribution functions. Clearly, this
mixture can be described as the distribution function of a random variable
of the form

$\xi = \sqrt{k}\, \bar{X} + \sigma \zeta,$

where $\zeta$ is a standard normal random variable independent of all
r.v.'s $X_j$. It has characteristic function

(5.3)   $h(t) = \mathbf{E}\, e^{it\xi} = \mathbf{E} \exp(\sqrt{k}\, \bar{X}\, it - \sigma^2 t^2/2).$

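Lemma 5.1. If $\mathbf{E} X_j = 0$, $\mathbf{E}|X_j|^3 \le \beta$
$(1 \le j \le n)$, then

$\sup_{t > 0} \frac{|f(t) - h(t)|}{t} \le 3\Bigl( \frac{k}{n} \Bigr)^{1/2} + 6\, \frac{\beta^{1/4}}{k^{1/8}}.$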

Let $H$ denote the distribution function of $\xi$. By Bohman's inequality
(3.5) applied to $F_1 = F$ and $F_2 = H$, we get

$L^2(F, H) \le 6\Bigl( \frac{k}{n} \Bigr)^{1/2} + 12\, \frac{\beta^{1/4}}{k^{1/8}}.$

The quantity on the right-hand side is small once $k$ is large and
$\frac{k}{n}$ is small. In this case,
$\mathbf{E}(\sqrt{k}\, \bar{X})^2 = \frac{k}{n}$ is small as well, and
according to (5.3), $h(t)$ is close to the characteristic function

$\mathbf{E} \exp\Bigl( -\frac{t^2}{2n} \sum_{j=1}^n X_j^2 \Bigr).$


Thus, we arrive at the following conclusion that includes the statement of
Theorem 1.2. Let $(X_j)_{j=1}^\infty$ be a sequence of random variables
that satisfies the correlation condition (1.1) and such that
$\mathbf{E} X_j = 0$, $\sup_j \mathbf{E}|X_j|^3 < +\infty$. Assume that,
for some random variable $R \ge 0$, as $n \to \infty$,

$\frac{1}{n} \sum_{j=1}^n X_j^2 \to R^2$

in the sense of the weak convergence of distributions on the real line. Let
$\Phi_R$ denote the distribution function of the random variable $R\zeta$,
where $\zeta$ is a standard normal random variable that is independent of
$R$. Then we have:

Theorem 5.2. For all $(k, n)$ in the range $1 \ll k \ll n$, for every
$\delta > 0$ and for all $\tau \in \mathcal{G}_{n,k}$ except for a set of
$\mu$-measure at most $C k^{3/4} \exp(-c k \delta^8)$, we have

$L(F_\tau, \Phi_R) \le \delta + o(1).$
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Proof of Lemma 5.1. We use the following standard estimate (needed 
for the Berry-Esseen theorem; cf., e.g., (year?), Chapter V, paragraph 2, 
Lemma 1): If Z \,..., Zk are independent r.v.’s such that E Zi = 0, E| Zi\ 3 < oo 
and B = Ya-i EZ 2 , then 


Eexp 


it(Z\ + • • • + Zjf) 
7 B 


exp 


-r 

~2 


< 16L|t| 3 exp 


-r 

"T 


\t\ < 


4 V 


where L = B 3 / 2 Yi =l E|Z;| 3 (the so-called Lyapunov fraction). Dividing by 
t and maximizing the right-hand side over all t > 0, we get 


\[ \frac{\big|\mathbf{E}\exp\big(it(Z_1+\dots+Z_k)/\sqrt{B}\big) - \exp(-t^2/2)\big|}{t} \le 18L, \tag{5.4} \]

provided that $0 < t \le \frac{1}{4L}$. In the case $t > \frac{1}{4L}$, the left-hand side can be estimated by $\frac{2}{t} < 8L$, so (5.4) holds for all $t > 0$. In particular, if the $Z_i$'s are identically distributed with $\mathbf{E}Z_i^2 = \sigma^2$ and $\mathbf{E}|Z_i|^3 = \beta$, then $B = \sigma^2 k$, $L = \beta/(\sigma^3\sqrt{k})$, and the above bound yields


\[ \max_{t>0} \frac{\big|\mathbf{E}\exp\big(it(Z_1+\dots+Z_k)/\sqrt{k}\big) - \exp(-\sigma^2t^2/2)\big|}{t} \le \frac{18\,\beta}{\sqrt{k}\,\sigma^3}. \]
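The constant 18 in (5.4) can be checked by hand: dividing the bound $16L|t|^3\exp(-t^2/3)$ by $t$ leaves $16L\,t^2\exp(-t^2/3)$, and the function $t^2\exp(-t^2/3)$ is maximal at $t = \sqrt{3}$, which gives $48/e \approx 17.66 < 18$. A minimal numerical sketch of this computation follows (the function name `rhs` and the grid are our own illustrative choices):

```python
import math

def rhs(t):
    """16 * t^2 * exp(-t^2 / 3): the bound 16 L |t|^3 exp(-t^2/3) divided by L * t."""
    return 16.0 * t * t * math.exp(-t * t / 3.0)

# Calculus gives the maximizer t* = sqrt(3) and the maximal value 48 / e.
t_star = math.sqrt(3.0)
closed_form = 48.0 / math.e

# Crude grid search over (0, 10] as an independent sanity check.
grid_max = max(rhs(i / 1000.0) for i in range(1, 10001))

print(closed_form, grid_max)
```

Since the maximum stays below 18, the inequality (5.4) holds on the range $0 < t \le 1/(4L)$ with room to spare.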


In particular, this inequality can be applied on the probability space $(M, Q)$ to random variables $Z_i = Y_i - \bar{X}$. In this case, $\sigma^2 = \sigma^2(\omega)$ represents the sample variance (5.2) and similarly $\beta(\omega) = \frac{1}{n}\sum_{j=1}^n |X_j - \bar{X}|^3$. Thus, introducing the characteristic function $h_\omega(t) = \exp(\sqrt{k}\bar{X}it - \sigma^2t^2/2)$, we obtain that


\[ \sup_{t>0} \frac{|g_\omega(t) - h_\omega(t)|}{t} \le \frac{18\,\beta(\omega)}{\sqrt{k}\,\sigma^3}. \tag{5.5} \]


Note that both $g_\omega$ and $h_\omega$ correspond to distributions with expectation $\sqrt{k}\bar{X}$ and variance $\sigma^2$. Hence, by Taylor's expansion around zero, $|g_\omega(t) - h_\omega(t)| \le \sigma^2t^2$ for all $t \in \mathbf{R}$. On the other hand, we always have a trivial bound $|g_\omega(t) - h_\omega(t)| \le 2$. Combining these, we get

\[ \frac{|g_\omega(t) - h_\omega(t)|}{t} \le \min\Big\{\sigma^2 t, \frac{2}{t}\Big\} \le \sqrt{2}\,\sigma, \qquad t > 0. \]


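Together with (5.5) and maximizing over $\sigma > 0$, this gives, for all $t > 0$,

\[ \frac{|g_\omega(t) - h_\omega(t)|}{t} \le \min\Big\{\frac{18\,\beta(\omega)}{\sqrt{k}\,\sigma^3}, \sqrt{2}\,\sigma\Big\} \le \frac{3\,\beta(\omega)^{1/4}}{k^{1/8}}. \]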
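The maximization over $\sigma$ can be made explicit. Assuming the two bounds have the form $A/\sigma^3$ with $A = 18\beta(\omega)/\sqrt{k}$ and $\sqrt{2}\sigma$, the first is decreasing and the second increasing in $\sigma$, so their minimum is largest where they cross, at $\sigma^4 = A/\sqrt{2}$, with value $A^{1/4}2^{3/8}$. The sketch below (helper names and the test value of `A` are hypothetical) evaluates the resulting numerical constant in front of $\beta(\omega)^{1/4}/k^{1/8}$:

```python
import math

def capped_bound(A, sigma):
    """min{A / sigma^3, sqrt(2) * sigma} for A > 0 and sigma > 0."""
    return min(A / sigma**3, math.sqrt(2.0) * sigma)

def sup_over_sigma(A):
    """Closed-form supremum over sigma > 0; the branches cross at sigma^4 = A / sqrt(2)."""
    return A**0.25 * 2.0**0.375

# Constant in front of beta^{1/4} / k^{1/8} when A = 18 * beta / sqrt(k).
constant = 18.0**0.25 * 2.0**0.375

# Grid check for one illustrative value of A (chosen only for this test).
A = 0.37
grid_sup = max(capped_bound(A, s / 10000.0) for s in range(1, 50001))

print(constant, grid_sup, sup_over_sigma(A))
```

Under these assumptions the constant is about 2.67, so any numerical factor of at least that size is admissible in the last inequality.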


Averaging over $\omega$ and using Hölder's inequality, we obtain that

\[ \frac{|g(t) - h(t)|}{t} \le \frac{3\,(\mathbf{E}\beta(\omega))^{1/4}}{k^{1/8}}, \]

since $h(t) = \mathbf{E}h_\omega(t)$. To estimate $\mathbf{E}\beta(\omega)$, we may apply Jensen's inequality, implying $|X_j - \bar{X}|^3 \le \frac{1}{n}\sum_{i=1}^n |X_j - X_i|^3$. Since $\mathbf{E}|X_j - X_i|^3 \le 4\mathbf{E}|X_j|^3 + 4\mathbf{E}|X_i|^3 \le 8\beta$, we arrive at $\mathbf{E}|X_j - \bar{X}|^3 \le 8\beta$ and, therefore, $\mathbf{E}\beta(\omega) \le 8\beta$. Hence,


\[ \frac{|g(t) - h(t)|}{t} \le \frac{6\,\beta^{1/4}}{k^{1/8}}. \tag{5.6} \]
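The moment bound $\mathbf{E}|X_j - X_i|^3 \le 4\mathbf{E}|X_j|^3 + 4\mathbf{E}|X_i|^3$ used above rests on the pointwise inequality $|a - b|^3 \le 4(|a|^3 + |b|^3)$, which follows from the convexity of $s \mapsto |s|^3$. As a quick sanity check, the sketch below (the helper `cube_gap` and the grid are our own illustrative choices) evaluates the gap on a grid and at the equality case $b = -a$:

```python
def cube_gap(a, b):
    """4(|a|^3 + |b|^3) - |a - b|^3, which should be nonnegative."""
    return 4.0 * (abs(a) ** 3 + abs(b) ** 3) - abs(a - b) ** 3

# Smallest gap over a symmetric grid in [-5, 5] x [-5, 5].
grid = [i / 10.0 for i in range(-50, 51)]
worst = min(cube_gap(a, b) for a in grid for b in grid)

# Equality case: b = -a gives |a - b|^3 = 8|a|^3 = 4(|a|^3 + |b|^3).
print(worst, cube_gap(1.0, -1.0))
```

The gap vanishes only on the antidiagonal $b = -a$, so the factor 4 cannot be lowered in the pointwise inequality.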


It remains to involve the characteristic function $f$. Combining (5.1) and (5.6), we get

\[ |f(t) - h(t)| \le \frac{6k}{n} + \frac{6\,\beta^{1/4}}{k^{1/8}}\,t, \qquad t > 0. \tag{5.7} \]

On the other hand, $\mathbf{E}\xi = 0$ and, by independence of $\zeta$ and $(X_1,\dots,X_n)$,

\[ \mathbf{E}\xi^2 = \mathbf{E}(\sqrt{k}\bar{X} + \sigma\zeta)^2 = k\,\mathbf{E}(\bar{X})^2 + \mathbf{E}\sigma^2 = 1 + \frac{k-1}{n} \le 2, \]

so $h'(0) = 0$ and $|h''(t)| \le 2$ for all $t \in \mathbf{R}$. In addition, the distribution function $F$ has mean 0 and variance 1, so $f'(0) = 0$ and $|f''(t)| \le 1$. Consequently, by Taylor's expansion around zero, $\frac{|f(t) - h(t)|}{t} \le \frac{3t}{2}$, $t > 0$. Together with (5.7), the latter gives


\[ \frac{|f(t) - h(t)|}{t} \le \frac{3}{2}\min\Big\{t, \frac{4k}{nt} + \frac{4\,\beta^{1/4}}{k^{1/8}}\Big\}. \]


Finally, let us note that, given $a, b \ge 0$, a function of the form $u(t) = \min\{t, \frac{b}{t} + a\}$ attains its maximum at $t_0 = (a + \sqrt{a^2 + 4b})/2$ and, at this point, $u(t_0) = t_0 \le a + \sqrt{b}$. Applying this to $b = \frac{4k}{n}$ and $a = 4\beta^{1/4}/k^{1/8}$, we arrive at


\[ \sup_{t>0} \frac{|f(t) - h(t)|}{t} \le \frac{3}{2}\Big[2\sqrt{\frac{k}{n}} + \frac{4\,\beta^{1/4}}{k^{1/8}}\Big]. \]
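The elementary fact about $u(t) = \min\{t, b/t + a\}$ is easy to confirm numerically: the increasing branch $t$ and the decreasing branch $b/t + a$ meet at the positive root of $t^2 - at - b = 0$, and $\sqrt{a^2 + 4b} \le a + 2\sqrt{b}$ gives $t_0 \le a + \sqrt{b}$. A small sketch, with illustrative parameter pairs of our own choosing:

```python
import math

def u(t, a, b):
    """min{t, b/t + a} for t > 0."""
    return min(t, b / t + a)

def t0(a, b):
    """Positive root of t = b/t + a, i.e. of t^2 - a*t - b = 0."""
    return (a + math.sqrt(a * a + 4.0 * b)) / 2.0

def check(a, b, steps=20000, t_max=20.0):
    """Return (grid supremum of u, claimed maximizer t0, upper bound a + sqrt(b))."""
    grid_sup = max(u(t_max * i / steps, a, b) for i in range(1, steps + 1))
    return grid_sup, t0(a, b), a + math.sqrt(b)

# Illustrative (a, b) pairs, including the boundary cases a = 0 and b = 0.
results = [check(a, b) for a, b in [(0.5, 0.04), (1.0, 1.0), (0.0, 2.0), (3.0, 0.0)]]
for row in results:
    print(row)
```

In each case the grid supremum agrees with $t_0$ up to the grid resolution, and $t_0$ does not exceed $a + \sqrt{b}$.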


Lemma 5.1 and, therefore, Theorem 5.2 are proved. □ 


6. Exchangeable random variables. Random variables $X_1,\dots,X_k$ are called exchangeable (or interchangeable) if the distribution of the random vector $X = (X_1,\dots,X_k)$, as a measure on $\mathbf{R}^k$, is invariant under permutations of coordinates. A similar definition applies in the case of an infinite sequence $\{X_k\}_{k=1}^\infty$. In particular, for all $k \ge 1$, the distributions of the normalized sums $(X_{i_1} + \dots + X_{i_k})/\sqrt{k}$ do not depend on the choice of indices $i_1 < \dots < i_k$. So let

\[ S_k = \frac{X_1 + \dots + X_k}{\sqrt{k}}. \]

Given that

\[ \mathbf{E}X_1 = 0, \qquad \mathbf{E}X_1^2 = 1, \tag{6.1} \]

a well-known theorem due to Blum, Chernoff, Rosenblatt and Teicher [4] asserts that $S_k \to N(0,1)$ weakly in distribution as $k \to \infty$ if and only if

\[ \mathbf{E}X_1X_2 = 0, \qquad \mathbf{E}X_1^2X_2^2 = 1; \tag{6.2} \]

that is, $\mathrm{cov}(X_1,X_2) = \mathrm{cov}(X_1^2,X_2^2) = 0$. Moreover, Berry-Esseen's bound

\[ \sup_{x \in \mathbf{R}} |\mathbf{P}\{S_k \le x\} - \Phi(x)| \le \frac{c\,\mathbf{E}|X_1|^3}{\sqrt{k}}, \tag{6.3} \]

with some universal $c$, extends from the i.i.d. case to this case as well.

Weaker assumptions than (6.1) and (6.2) with different normalization of the sums may also lead to asymptotic normality (see, e.g., [22, 24]). However, less seems to be known in the case of finite sequences of exchangeable variables. A basic tool that allows study of the various properties of an infinite exchangeable sequence $X = \{X_k\}_{k=1}^\infty$ is de Finetti's representation of the distribution $P_X$ of $X$ as a mixture

\[ P_X = \int_\Pi \mu_\alpha^\infty \, d\pi(\alpha) \tag{6.4} \]

of product probability measures $\mu_\alpha^\infty = \mu_\alpha \otimes \mu_\alpha \otimes \cdots$ on $\mathbf{R}^\infty$. Here $(\Pi, \pi)$ is some probability space and $\{\mu_\alpha\}_{\alpha \in \Pi}$ is some family of marginals with the property that functions $\alpha \to \mu_\alpha(B)$ are $\pi$-measurable for all Borel sets $B$ on the real line. In terms of this representation and assuming (6.1) is fulfilled, the central limit theorem $S_k \to N(0,1)$ holds true if and only if the measures $\mu_\alpha$ have mean 0 and variance 1 for $\pi$-almost all $\alpha$ ([4], Lemma 1). The latter is also characterized directly in terms of $X$ in the form (6.2).
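To see why the second condition in (6.2) singles out unit conditional variances, note that under a mixture of the form (6.4) whose marginals $\mu_\alpha$ have mean 0 and variance $v(\alpha)$, conditional independence gives $\mathbf{E}X_1^2 = \mathbf{E}v$ and $\mathbf{E}X_1^2X_2^2 = \mathbf{E}v^2$, so that $\mathbf{E}X_1^2X_2^2 - (\mathbf{E}X_1^2)^2 = \mathrm{Var}(v) \ge 0$. The following exact-arithmetic sketch uses a hypothetical two-point mixing law for $v$ (the values 1/2 and 3/2 are our own example, not taken from the text):

```python
from fractions import Fraction

def moments(variances):
    """(E X_1^2, E X_1^2 X_2^2) when the conditional variance v is uniform on `variances`."""
    p = Fraction(1, len(variances))
    ex1_sq = sum(p * v for v in variances)          # E X_1^2 = E v
    ex1sq_x2sq = sum(p * v * v for v in variances)  # E X_1^2 X_2^2 = E v^2
    return ex1_sq, ex1sq_x2sq

degenerate = moments([Fraction(1), Fraction(1)])     # v = 1 almost surely
spread = moments([Fraction(1, 2), Fraction(3, 2)])   # v random with mean 1

print(degenerate, spread)
```

In the second case (6.1) still holds, while $\mathbf{E}X_1^2X_2^2 = 5/4 > 1$, so (6.2) fails and the limit of $S_k$ is a genuine mixture of normal laws.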

In the general case of a finite exchangeable sequence $X = (X_1,\dots,X_k)$, the finite-dimensional analogue of (6.4),

\[ P_X(B) = \int \mu_\alpha^k(B)\, d\pi(\alpha), \qquad B \subset \mathbf{R}^k, \tag{6.5} \]

where $\mu_\alpha^k = \mu_\alpha \times \dots \times \mu_\alpha$ are product probability measures on $\mathbf{R}^k$, is no longer valid and, in fact, the class of distributions on $\mathbf{R}^k$ invariant under permutations of coordinates is much wider. Therefore, it is natural to associate to $X$ a maximum natural number $n = n(X)$ such that, for some exchangeable sequence $\tilde{X}_1,\dots,\tilde{X}_n$ defined perhaps on a different probability space, the random vectors $(X_1,\dots,X_k)$ and $(\tilde{X}_1,\dots,\tilde{X}_k)$ are equidistributed. If $n$ can be chosen as large as we wish or, equivalently, if $P_X$ admits representation (6.5), put $n(X) = \infty$.

It may occur that $X$ has no exchangeable extension: $n(X) = k$. In that case, it is hardly possible to reach asymptotic normality of the normalized sum $S_k$, even under moment assumptions such as (6.1) and (6.2). However, when $n(X) \gg k$, the situation changes considerably. In view of de Finetti's theorem, it seems natural to expect in this case that $P_X$ has to be close in some sense to the class $\mathcal{M}_k$ of mixtures of product probability measures on $\mathbf{R}^k$. That is, there should hold an approximate equality in (6.5). In terms of the variational distance $\|\cdot\|_{TV}$ between probability measures, this question was studied by Diaconis and Freedman [16]. It was shown, in particular, that, for some $Q$ in $\mathcal{M}_k$,

\[ \frac{1}{2}\|P_X - Q\|_{TV} \le 1 - \frac{n!}{(n-k)!\,n^k}, \qquad n = n(X), \tag{6.6} \]

and that the bound cannot be improved. Actually, if an exchangeable extension $X_1,\dots,X_n$ exists on the same probability space $(\Omega, \mathbf{P})$, we can take $Q(B) = \int \mu_\omega^k(B)\, d\mathbf{P}(\omega)$, that is, with

\[ \Pi = \Omega, \qquad \pi = \mathbf{P}, \qquad \mu_\omega = \frac{\delta_{X_1(\omega)} + \dots + \delta_{X_n(\omega)}}{n}. \]
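For orientation, the size of the Diaconis-Freedman bound can be tabulated directly. The sketch below assumes that the right-hand side of (6.6) is the quantity $1 - n!/((n-k)!\,n^k) = 1 - \prod_{j<k}(1 - j/n)$, that is, the probability of at least one repetition among $k$ draws with replacement from $n$ items; the function name `df_bound` and the sample values of $(k, n)$ are our own.

```python
def df_bound(k, n):
    """1 - n! / ((n - k)! * n^k), computed as 1 - prod_{j<k} (1 - j/n)."""
    prod = 1.0
    for j in range(k):
        prod *= 1.0 - j / n
    return 1.0 - prod

small = df_bound(10, 10**6)     # k much smaller than sqrt(n)
large = df_bound(1000, 10**4)   # k = 10 * sqrt(n)

# The union bound gives df_bound(k, n) <= k(k-1)/(2n).
print(small, 10 * 9 / (2 * 10**6), large)
```

For $k$ well below $\sqrt{n}$ the value is essentially $k(k-1)/(2n)$, whereas for $k$ of order $10\sqrt{n}$ it is already indistinguishable from 1.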


Under the product measures, the distribution of the function $x \to (x_1 + \dots + x_k)/\sqrt{k}$ is nearly normal (under proper moment conditions), so the inequality (6.6) can be used to study the asymptotic normality of $S_k$. However, as emphasized in [16], the expression on the right-hand side in (6.6) is of order $k^2/n$ for $k = o(\sqrt{n})$, while it is of order 1 for larger values of $k$. Hence, only the range $k = O(\sqrt{n})$ can be taken into consideration, or other metrics that better react to the weak convergence of distributions should be examined in the case $k \ge O(\sqrt{n})$. In the part concerning half-spaces of the form $B = \{x \in \mathbf{R}^k : x_1 + \dots + x_k \le c\}$, the closeness of $P_X(B)$ to $Q(B)$ can be estimated by virtue of Proposition 4.1. As a consequence, we can derive:


Proposition 6.1. Let $X = (X_1,\dots,X_k)$, $k \ge 2$, be an exchangeable sequence that satisfies the moment hypotheses (6.1) and (6.2). Then

\[ \sup_{x \in \mathbf{R}} |\mathbf{P}\{S_k \le x\} - \Phi(x)| \le c\left[\Big(\frac{k}{n}\Big)^p + \frac{(\mathbf{E}|X_1|^4)^{1/6}}{k^q}\right] \tag{6.7} \]

for some universal $c > 0$ and $p, q > 0$.


Although this is not as sharp as (6.3), we can still control closeness to normality for finite sequences under the same hypotheses. The assumption $\mathbf{E}X_1^4 < +\infty$ is technical and can be a little relaxed (to the third moment, e.g.). The second assumption in (6.2) can be weakened to $\mathbf{E}X_1^2X_2^2 \le 1$. Although a strict inequality is impossible here for infinite exchangeable sequences, it does hold for some interesting finite exchangeable sequences (cf., e.g., (year?)).

Proof of Proposition 6.1. Let $X$ have an exchangeable extension $X_1,\dots,X_n$ on $(\Omega, \mathbf{P})$. By exchangeability, $F(x) = \mathbf{P}\{S_k \le x\}$ represents the average distribution function (1.2), and its characteristic function $f$ appears in (4.1). Note that, under the measure $Q(B) = \int \mu_\omega^k(B)\, d\mathbf{P}(\omega)$, the function $x \to (x_1 + \dots + x_k)/\sqrt{k}$ has distribution $G$ considered along the proof of Theorem 5.2. Moreover, by Lemma 5.1 and Hölder's inequality,


\[ \sup_{t>0} \frac{|f(t) - h(t)|}{t} \le 3\Big(\frac{k}{n}\Big)^{1/2} + \frac{6\,(\mathbf{E}X_1^4)^{1/4}}{k^{1/8}}, \]


where we recall that $h(t) = \mathbf{E}\exp(\sqrt{k}\bar{X}it - \sigma^2t^2/2)$ represents the characteristic function of $\xi = \sqrt{k}\bar{X} + \sigma\zeta$ with $\zeta \in N(0,1)$ independent of $(X_1,\dots,X_n)$. By Bohman's inequality (3.5) and using $\sqrt{a+b} \le \sqrt{a} + \sqrt{b}$ ($a, b \ge 0$), we may write down a bound on the Levy distance,

(6.8)

for the associated distribution functions. Note that we have used the assumptions $\mathbf{E}X_1 = \mathbf{E}X_1X_2 = 0$ and $\mathbf{E}X_1^2 = 1$ in this step.

To quantify closeness of the distribution function $H$ to $\Phi$, we write $\xi = \zeta + \eta$ with a small "error" $\eta = \sqrt{k}\bar{X} + (\sigma - 1)\zeta$. We apply the following general observation: For all random variables $\zeta$ and $\eta$,

\[ L(F_{\zeta+\eta}, F_\zeta) \le (\mathbf{E}\eta^2)^{1/3}, \tag{6.9} \]

where $F_{\zeta+\eta}$ and $F_\zeta$ are the corresponding distribution functions. Indeed, there is nothing to prove if $\delta = (\mathbf{E}\eta^2)^{1/3} \ge 1$. In the other case, since for all $x \in \mathbf{R}$ and $h > 0$,

\[ \{\zeta \le x\} = \{\zeta \le x, \eta \le h\} \cup \{\zeta \le x, \eta > h\} \subset \{\zeta + \eta \le x + h\} \cup \{\eta > h\}, \]

by Chebyshev's inequality, we get $F_\zeta(x) \le F_{\zeta+\eta}(x + h) + \mathbf{E}\eta^2/h^2$. Applying the latter to another couple of random variables $(\zeta + \eta, -\eta)$ and to $x - h$ in the place of $x$, we also have $F_{\zeta+\eta}(x - h) \le F_\zeta(x) + \mathbf{E}\eta^2/h^2$. All this together with $h = \delta$ yields

\[ F_{\zeta+\eta}(x - \delta) - \delta \le F_\zeta(x) \le F_{\zeta+\eta}(x + \delta) + \delta, \]

which is exactly (6.9).
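The observation (6.9) holds on any probability space, in particular on a finite sample space with the uniform measure, where it can be verified directly. The sketch below draws a hypothetical sample of pairs $(\zeta, \eta)$ with $\eta$ deliberately dependent on $\zeta$ (the particular form of $\eta$, the seed and the grid are our own choices), takes $\delta = (\mathbf{E}\eta^2)^{1/3}$ under the empirical measure, and checks the two-sided inequality behind the Levy distance on a grid of points $x$:

```python
import bisect
import random

random.seed(0)
N = 2000
zeta = [random.gauss(0.0, 1.0) for _ in range(N)]
# A small error term that depends on zeta; independence is not needed for (6.9).
eta = [0.3 * z * z - 0.2 + 0.1 * random.gauss(0.0, 1.0) for z in zeta]

zeta_sorted = sorted(zeta)
total_sorted = sorted(z + e for z, e in zip(zeta, eta))

def cdf(sorted_sample, x):
    """Distribution function P{value <= x} under the uniform measure on the sample."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

# delta = (E eta^2)^{1/3}, with the expectation taken over the finite sample space.
delta = (sum(e * e for e in eta) / N) ** (1.0 / 3.0)

def sandwich_holds(x, tol=1e-12):
    lower = cdf(total_sorted, x - delta) - delta
    upper = cdf(total_sorted, x + delta) + delta
    return lower - tol <= cdf(zeta_sorted, x) <= upper + tol

all_ok = all(sandwich_holds(-5.0 + 0.01 * i) for i in range(1001))
print(delta, all_ok)
```

Since the inequality is checked for the exact distribution functions of the finite sample space, no sampling error is involved; the grid merely restricts the set of points $x$ that are tested.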

Thus, returning to our specific random variables $(\zeta, \eta)$, since $F_{\zeta+\eta} = H$ and $F_\zeta = \Phi$, we may conclude that

\[ L(H, \Phi) \le (\mathbf{E}\eta^2)^{1/3}. \tag{6.10} \]

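Now, since $\mathbf{E}X_1X_2 = 0$ and $\mathbf{E}X_1^2 = 1$, we have $\mathbf{E}\eta^2 = \mathbf{E}(\sqrt{k}\bar{X})^2 + \mathbf{E}(\sigma - 1)^2 = \frac{k-1}{n} + 2\mathbf{E}(1 - \sigma)$. Also note $1 - \sigma = (1 - \sigma^2)/(1 + \sigma) = (1 - \overline{X^2} + (\bar{X})^2)/(1 + \sigma)$, so $|1 - \sigma| \le |\overline{X^2} - 1| + (\bar{X})^2$, where $\overline{X^2} = \frac{1}{n}\sum_{j=1}^n X_j^2$. By the assumption $\mathbf{E}X_1^2X_2^2 = 1$ and since $\mathbf{E}X_1^2 = 1$,

\[ \mathbf{E}|\overline{X^2} - 1|^2 = \mathrm{Var}(\overline{X^2}) = \frac{\mathrm{Var}(X_1^2)}{n} + \frac{n-1}{n}\,\mathrm{cov}(X_1^2, X_2^2) = \frac{\mathbf{E}X_1^4 - 1}{n}. \]

Therefore, $\mathbf{E}|\overline{X^2} - 1| \le (\mathbf{E}X_1^4)^{1/2}/\sqrt{n}$ and $\mathbf{E}|1 - \sigma| \le (\mathbf{E}X_1^4)^{1/2}/\sqrt{n} + \frac{1}{n}$. Thus, we get $\mathbf{E}\eta^2 \le (k+1)/n + 2(\mathbf{E}X_1^4)^{1/2}/\sqrt{n}$ and by (6.10),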


\[ L(H, \Phi) \le 2\Big(\frac{k}{n}\Big)^{1/3} + \frac{2\,(\mathbf{E}X_1^4)^{1/6}}{n^{1/6}}. \]


Combining this with (6.8) and making use of $k \le n$ and $(\mathbf{E}X_1^4)^{1/8} \le (\mathbf{E}X_1^4)^{1/6}$ (since the fourth moment is greater than or equal to 1), we obtain that

\[ L(F, \Phi) \le C_1\Big(\frac{k}{n}\Big)^{1/4} + C_2\,\frac{(\mathbf{E}X_1^4)^{1/6}}{k^{1/16}}. \]


Finally, we always have $\|F - \Phi\|_\infty \le 2L(F, \Phi)$, so we arrive at (6.7) with $p = \frac{1}{4}$, $q = \frac{1}{16}$. This completes the proof. □